A Novel Stochastic Stratified Average Gradient Method: Convergence Rate and Its Complexity
Authors
Abstract
SGD (Stochastic Gradient Descent) is a popular algorithm for large-scale optimization problems due to its low per-iteration cost. However, SGD cannot achieve the linear convergence rate of FGD (Full Gradient Descent) because of the inherent gradient variance. To address this problem, mini-batch SGD was proposed as a trade-off between convergence rate and iteration cost. In this paper, a general CVI (Convergence-Variance Inequality) is presented to state formally how convergence rate and gradient variance interact. A novel algorithm named SSAG (Stochastic Stratified Average Gradient) is then introduced to reduce gradient variance based on two techniques: stratified sampling and averaging over iterations, the key idea of SAG (Stochastic Average Gradient). Furthermore, SSAG achieves a linear convergence rate of O((1 − μ/(8CL))^k) with smaller storage and iteration costs, where C ≥ 2 is the number of classes in the training data. This convergence rate depends mainly on the variance between classes, not on the variance within classes. When C ≪ N (N is the training data size), SSAG's convergence rate is much better than SAG's rate of O((1 − μ/(8NL))^k). Our experimental results show that SSAG outperforms SAG and many other algorithms.
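The abstract gives no pseudocode, so the following is only a minimal sketch of what a stratified-average-gradient update could look like, assuming one stored gradient per class (C slots) and a hypothetical grad_fn(x, xi, yi) returning the gradient of a single-example loss; it is not the authors' exact algorithm.

```python
import numpy as np

def ssag_sketch(grad_fn, X, y, x0, step=0.01, n_iters=1000, rng=None):
    """Sketch (not the paper's pseudocode): keep one stored gradient per class,
    refresh one slot per iteration via stratified sampling, and step along the
    average of the C stored gradients."""
    rng = np.random.default_rng() if rng is None else rng
    classes = np.unique(y)                      # the C strata, defined by labels
    memory = np.zeros((len(classes), x0.size))  # one stored gradient per class
    x = x0.copy()
    for _ in range(n_iters):
        c = rng.integers(len(classes))                      # pick a stratum
        i = rng.choice(np.flatnonzero(y == classes[c]))     # sample inside it
        memory[c] = grad_fn(x, X[i], y[i])                  # refresh that slot
        x -= step * memory.mean(axis=0)                     # averaged-memory step
    return x
```

Storing only C gradients instead of N is what the abstract refers to as the smaller storage cost relative to SAG.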
Similar Resources
Trading-off variance and complexity in stochastic gradient descent
Stochastic gradient descent (SGD) is the method of choice for large-scale machine learning problems, by virtue of its light complexity per iteration. However, it lags behind its non-stochastic counterparts with respect to the convergence rate, due to high variance introduced by the stochastic updates. The popular Stochastic Variance-Reduced Gradient (Svrg) method mitigates this shortcoming, int...
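For context, here is a compact sketch of the standard SVRG update mentioned in this snippet (a full gradient at a snapshot point corrects the inner stochastic steps); grad_fn and full_grad_fn are hypothetical single-example and full-dataset gradient oracles.

```python
import numpy as np

def svrg_sketch(grad_fn, full_grad_fn, X, y, x0, step=0.01, epochs=10, inner=None, rng=None):
    """Sketch of the standard SVRG scheme: snapshot full gradient + corrected inner steps."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    inner = n if inner is None else inner
    x = x0.copy()
    for _ in range(epochs):
        snapshot = x.copy()
        mu = full_grad_fn(snapshot, X, y)        # full gradient at the snapshot
        for _ in range(inner):
            i = rng.integers(n)
            g = grad_fn(x, X[i], y[i]) - grad_fn(snapshot, X[i], y[i]) + mu
            x -= step * g                        # variance-reduced step
    return x
```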
Momentum and Stochastic Momentum for Stochastic Gradient, Newton, Proximal Point and Subspace Descent Methods
In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all o...
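As an illustration of the heavy-ball idea referenced above, here is a minimal sketch of SGD with heavy-ball momentum; the step size, momentum coefficient beta, and grad_fn oracle are assumptions for the example.

```python
import numpy as np

def sgd_heavy_ball_sketch(grad_fn, X, y, x0, step=0.01, beta=0.9, n_iters=1000, rng=None):
    """Sketch of heavy-ball SGD: x_{k+1} = x_k - step * grad_i(x_k) + beta * (x_k - x_{k-1})."""
    rng = np.random.default_rng() if rng is None else rng
    x, x_prev = x0.copy(), x0.copy()
    for _ in range(n_iters):
        i = rng.integers(len(y))
        x_new = x - step * grad_fn(x, X[i], y[i]) + beta * (x - x_prev)
        x_prev, x = x, x_new
    return x
```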
Fast Convergence of Stochastic Gradient Descent under a Strong Growth Condition
We consider optimizing a smooth convex function f that is the average of a set of differentiable functions f_i, under the assumption considered by Solodov [1998] and Tseng [1998] that the norm of each gradient f′_i is bounded by a linear function of the norm of the average gradient f′. We show that under these assumptions the basic stochastic gradient method with a sufficiently-small c...
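The strong growth condition referenced in this snippet is commonly written as follows (a paraphrase of the standard statement, not a quote from the cited works):

```latex
% Strong growth condition (paraphrased): each component gradient is bounded
% by a constant multiple of the gradient of the averaged objective.
\|f_i'(x)\| \le B\,\|f'(x)\| \quad \text{for all } x \text{ and all } i,
\qquad f(x) = \frac{1}{N}\sum_{i=1}^{N} f_i(x).
```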
Accelerating Minibatch Stochastic Gradient Descent using Stratified Sampling
Stochastic Gradient Descent (SGD) is a popular optimization method which has been applied to many important machine learning tasks such as Support Vector Machines and Deep Neural Networks. In order to parallelize SGD, minibatch training is often employed. The standard approach is to uniformly sample a minibatch at each step, which often leads to high variance. In this paper we propose a stratif...
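A minimal sketch of stratified minibatch sampling in the spirit of this snippet, assuming the strata are given by class labels; the proportional per-stratum quota is an illustrative choice, not necessarily the paper's.

```python
import numpy as np

def stratified_minibatch_sketch(y, batch_size, rng=None):
    """Sketch: draw roughly proportionally from each class so every minibatch
    mirrors the class distribution, which typically reduces the variance of
    the minibatch gradient compared to uniform sampling."""
    rng = np.random.default_rng() if rng is None else rng
    classes, counts = np.unique(y, return_counts=True)
    batch = []
    for c, cnt in zip(classes, counts):
        k = max(1, int(round(batch_size * cnt / len(y))))   # per-stratum quota
        idx = np.flatnonzero(y == c)
        batch.extend(rng.choice(idx, size=min(k, cnt), replace=False))
    return np.array(batch)
```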
Minimizing Finite Sums with the Stochastic Average Gradient
We propose the stochastic average gradient (SAG) method for optimizing the sum of a finite number of smooth convex functions. Like stochastic gradient (SG) methods, the SAG method’s iteration cost is independent of the number of terms in the sum. However, by incorporating a memory of previous gradient values the SAG method achieves a faster convergence rate than black-box SG methods. The conver...
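A minimal sketch of the SAG-style update described above, with one stored gradient per training example and a step along their average; grad_fn and the constant step size are assumptions for the example.

```python
import numpy as np

def sag_sketch(grad_fn, X, y, x0, step=0.01, n_iters=1000, rng=None):
    """Sketch of SAG: keep N stored gradients, refresh one randomly chosen slot
    per iteration, and step along the average of all stored gradients."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    memory = np.zeros((n, x0.size))      # one stored gradient per example
    total = np.zeros(x0.size)            # running sum of the memory table
    x = x0.copy()
    for _ in range(n_iters):
        i = rng.integers(n)
        g = grad_fn(x, X[i], y[i])
        total += g - memory[i]           # update the sum incrementally
        memory[i] = g
        x -= step * total / n            # average of the stored gradients
    return x
```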
Journal: CoRR
Volume: abs/1710.07783
Pages: -
Publication date: 2017